Listening to Talking Faces: Motor Cortical Activation During Speech Perception

Authors

  • Jeremy I. Skipper
  • Howard C. Nusbaum
  • Steven L. Small
Abstract

Neurophysiological research suggests that understanding the actions of others harnesses neural circuits that would be used to produce those actions directly. We used fMRI to examine brain areas active during language comprehension in which the speaker was seen and heard while talking (audiovisual) or heard but not seen (audio-alone) or when the speaker was seen talking but the audio track removed (video-alone). We also examined brain areas active during speech production. We found that speech perception in the audiovisual, but not audio-alone or video-alone conditions activated a network of brain regions overlapping cortical areas involved in speech production and proprioception related to speech production. These regions included the posterior part of the superior temporal gyrus and sulcus, the superior portion of the pars opercularis, the dorsal aspect of premotor cortex, adjacent primary motor cortex, somatosensory cortex, and the cerebellum. Activity in dorsal premotor cortex and posterior superior temporal gyrus and sulcus was modulated by the amount of visually distinguishable phonemes in the stories. These results suggest that integrating observed facial movements into the speech perception process involves a network of brain regions associated with speech production. We suggest that this distributed network serves to represent the visual configuration of observed facial movements, the motor commands that could have been used to generate that configuration, and the associated expected auditory consequences of executing that hypothesized motor plan. These regions do not, on average, contribute to speech perception when in the presence of the auditory or visual signals alone. Listening to Talking Faces 3 Introduction Neurobiological models of language processing have traditionally assigned receptive and expressive language functions to anatomically and functionally distinct brain regions. This division originates from the observation that Broca’s aphasia, characterized by nonfluent spontaneous speech and fair comprehension, is the result of more anterior brain lesions than Wernicke’s aphasia, which is characterized by fluent speech and poor comprehension, and is the result of more posterior brain lesions (Geschwind, 1965). As with many such simplifications, the distinction between expressive and receptive language functions within the brain is not as straightforward as it may have appeared. Comprehension in Broca’s aphasia is almost never fully intact nor is production in Wernicke’s aphasia normal (Goodglass, 1993). Electrical stimulation of sites in both the anterior and posterior aspects of the brain can disrupt both speech production and speech perception (Ojemann, 1979; Penfield & Roberts, 1959). Neuroimaging studies have also shown that classically defined receptive and expressive brain regions are often both active in tasks that are specifically designed to investigate either perception or production (Braun et al., 2001; Buchsbaum et al., 2001; Papathanassiou et al., 2000). Recent neurophysiological evidence from nonhuman primates suggests an explanation for the observed interactions between brain regions traditionally associated with either language comprehension or production. The explanation requires examination of motor cortices and their role in perception. Regions traditionally considered to be responsible for motor planning and motor control appear to play a role in perception and comprehension of action (Graziano & Gandhi, 2000; Romanski & Goldman-Rakic, 2002). 
Certain neurons with visual and/or auditory and motor properties in these regions discharge both when an action is performed and during perception of another person performing the same action (Gallese et al., 1996; Kohler et al., 2002; Rizzolatti et al., 1996). In the macaque brain, these neurons reside in area F5 which is the proposed homologue of Broca’s area, the classic speech production region of the human (Rizzolatti et al., 2002). The existence of “mirror neurons” suggests the hypothesis that action observation aids action understanding via activation of similar or overlapping brain regions used in action performance. If this is the case, perhaps speech understanding, classically thought to be an auditory process (e.g., Fant, 1960), might be aided in the context of face-to-face interaction by cortical areas more typically associated with speech production. Seeing facial motor behaviors corresponding to speech production Listening to Talking Faces 4 (e.g., lip and mouth movements) might aid language understanding by recognition of the intended gesture within the motor system, thus further constraining possible interpretations of the intended message. Audiovisual Language Comprehension Most of our linguistic interactions evolved, develop, and occur in a setting of face-to-face interaction where multiple perceptual cues can contribute to determining the intended message. Although we are capable of comprehending auditory speech without any visual input (e.g., listening to the radio, talking on the phone), observation of articulatory movements produces significant effects on comprehension throughout the lifespan. Infants are sensitive to various characteristics of audiovisual speech (Kuhl & Meltzoff, 1982; Patterson & Werker, 2003). By adulthood, the availability of visual information about speech production significantly enhances recognition of speech sounds in a background of noise (Grant & Seitz, 2000; Sumby & Pollack, 1954) and improves comprehension even when the auditory speech signal is clear (Reisberg et al., 1987). Furthermore, incongruent audiovisual information can change the identity of a speech percept. For example, when an auditory /ba/ is dubbed onto the video of someone making mouth movements appropriate for production of /ga/, the resulting percept is usually /da/. We are susceptible to this audiovisual integrative illusion from early childhood through adulthood (Massaro, 1998; McGurk & MacDonald, 1976). Our experience as talkers and as listeners may associate the acoustic patterns of speech with motor planning and proprioceptive and visual information about accompanying mouth movements and facial expressions. Thus experience reinforces the relationships among acoustic, visual, and proprioceptive sensory patterns and between sensory patterns and motor control of articulation, so that speech becomes an “embodied signal”, rather than just an auditory signal. That is, information relevant to the phonetic interpretation of speech may derive partly from experience with articulatory movements that are generated by a motor plan during speech production. The mechanisms that mediate these associations could provide a neural account for some of the observed interactions between acoustic and visual information in speech perception that may not be apparent by studying acoustic speech perception alone. The participation of brain areas critical for language production during audiovisual speech perception has not been fully explored. 
It may be that the observed effects on speech comprehension produced by observation of a speaker’s face involves visual cortical areas or other multisensory areas Listening to Talking Faces 5 (e.g., posterior superior temporal sulcus), and not actual speech production areas. However, the evidence from nonhuman primates with regard to “mirror neurons” suggests that production centers in concert with other brain regions are likely candidates for the neural structures mediating these behavioral findings. Neurophysiological Studies of Audiovisual Language Relatively little is known about the neural structures mediating the comprehension of audiovisual language. This may be because when language comprehension is not viewed as modalityindependent, spoken language comprehension is seen as essentially an auditory process, and that it should be investigated as such in neuropsychological and brain imaging studies. However, visual input plays an important role in spoken language comprehension, a role that cannot be accounted for as solely a cognitive bias to categorize linguistic units according to visual characteristics when acoustic and visual information are discrepant (Green, 1998). Neuroimaging studies of speech processing incorporating both auditory and visual modalities are often focused on the problem of determining specific sites of multisensory integration (Calvert et al., 2000; Mottonen et al., 2002; Olson et al., 2002; Sams et al., 1991; Surguladze et al., 2001). Other studies have focused on only one (potential) component of audiovisual language comprehension, speech (i.e., lip) reading (Calvert et al., 1997; Calvert & Campbell, 2003; Campbell et al., 2001; Ludman et al., 2000; MacSweeney et al., 2000; MacSweeney et al., 2002a; MacSweeney et al., 2001; Surguladze et al., 2001). However, few studies have investigated the extent of the entire network of brain regions involved in audiovisual language comprehension overall (Callan et al., 2001; MacSweeney et al., 2002b). Nonetheless these experiments have collectively yielded a fairly consistent result: Audiovisual speech integration and perception produce activation of auditory cortices, predominantly posterior superior temporal gyrus and superior temporal sulcus. Though studies have reported activation in areas important for speech production (e.g., MacSweeney et al., 2002b), there has not been much theoretical interpretation of these activations. This may be in part because some studies use tasks that require an explicit motor response (e.g., Calvert et al., 1997; MacSweeney et al., 2002b; Olson et al., 2002), which limit the inferences that can be drawn about the role of motor areas in perception (Small & Nusbaum, In Press). However, it would be surprising if brain regions important for language production (e.g., Broca’s area and the precentral gyrus and sulcus) did not play a role in audiovisual speech perception, given the known connectivity between frontal and superior Listening to Talking Faces 6 temporal structures (Barbas & Pandya, 1989; Hackett et al., 1999; Petrides & Pandya, 1988, 2002; Romanski et al., 1999) and the multisensory sensitivity of these areas (Graziano & Gandhi, 2000; Kohler et al., 2002; Romanski & Goldman-Rakic, 2002) in nonhuman primates. In the present study, we used fMRI with a block design to investigate whether audiovisual language comprehension activates a network of brain regions that are also involved in speech production and whether this network is sensitive to visual characteristics of observed speech. 
We also investigated whether auditory language comprehension alone (without visual information about the mouth movements accompanying speech production) would activate the same motor regions, as it has long been proposed that speech perception (whether multimodal or unimodal) occurs by reference to the speech production system (e.g., Liberman & Mattingly, 1985). Finally, we investigated whether the visual observation of the mouth movements accompanying speech activate this network even without the speech signal. In an audio-alone condition (A), participants listened to spoken stories. In an audiovisual condition (AV), participants saw and heard the storyteller telling these stories. In the video-alone (V) condition, participants watched video clips of the storyteller telling these stories, but without the accompanying soundtrack. Participants were instructed to listen to and/or watch the stories attentively. No other instructions were given (e.g., in the V condition, participants were not overtly asked to speech read). Stories were approximately 20 seconds in duration. Finally, a second group of participants produced consonant-vowel syllables (S), in the scanner so that we could identify brain regions involved in phonetic speech production. The data from this group allows us to ascertain the overlap between the actual regions activated during speech production with those areas activated in the different conditions of language comprehension. Results Group Results The total brain volume of activation for the A and V conditions together accounted for only 8% of the variance of the total volume associated with the AV condition (F (1, 8) = .561, p = .4728). This suggests that when the auditory and visual modalities are presented together, emergent activation occurs. The emergent activation in the AV condition appears to be mostly in frontal areas and posterior superior temporal gyrus and sulcus (STG/STS). Indeed, relative to baseline (i.e., rest), the AV but not the A condition activated a network of brain regions involved in sensory and motor control and critical for speech production (see the Listening to Talking Faces 7 discussion for further details; Tables 1; Figure 1). These areas include the inferior frontal gyrus (IFG; BA 44 and 45), the precentral gyrus and sulcus (BA 4 and 6), the postcentral gyrus, and the cerebellum. Of these regions, the A condition activated only a cluster in the IFG (BA 45). In the direct statistical contrast of the AV and A conditions (AV-A), the AV condition produced greater activation in the IFG (BA 44, 45, and 47), the middle frontal gyrus (MFG), the precentral gyrus and sulcus (BA 4, 6, and 9), and the cerebellum, whereas the A condition produced greater activation in the superior frontal gyrus and inferior parietal lobule. The AV-V contrast showed that AV produced greater activation in all frontal areas with the exception of the MFG, superior parietal lobule, and the right IFG (BA 44) for which V produced greater activation. Relative to baseline, both the AV but not the A or V conditions activated more posterior aspects of the STG/STS (BA 22), a region previously associated with biological motion perception, multimodal integration, and speech production. Though both the AV and A conditions activated the STG/STS (BA 41/42/22) bilaterally, regions commonly associated with auditory language comprehension, activation in the AV condition was more extensive and extended more posterior from the transverse temporal gyrus than activation in the A condition. 
The AV-A and AV-V contrasts confirmed this pattern. The AV and V conditions activated cortices associated with visual processing (BA 18, 19, 20, and 21) and the A condition did not. However, the V condition only activated small clusters in the inferior occipital gyrus and the inferior temporal gyrus relative to baseline whereas the AV condition activated more extensive regions of occipital cortex as well as the left fusiform gyrus (BA 18). However, the AV-V contrast revealed that the AV condition produced greater activation in these areas in the left hemisphere whereas the V condition produced greater activation in these areas in the right hemisphere. -------------------------------------------------------------------------Insert Table 1 and Figure 1 about here -------------------------------------------------------------------------As a control, we examined the overlap between the neural networks for overt articulation and for speech perception by performing conjunction analyses of the A, AV, and V conditions with the S condition. A conjunction of the AV and S, A and S, and V and S conditions revealed common areas of overlap in regions related to auditory processing (BA 41, 42, 43, 22). Uniquely, the conjunction of AV and S, i.e., audiovisual speech perception and overt articulation, activated the inferior frontal gyrus (IFG; BA 44 and 45), the precentral gyrus and sulcus (BA 4 and 6), the postcentral gyrus, and more Listening to Talking Faces 8 posterior aspects of the STG/STS (posterior BA 22). The conjunction of the A or V with the S condition produced no significant overlap in these regions (Figure 2). -------------------------------------------------------------------------Insert Figure 2 about here -------------------------------------------------------------------------Because we were concerned that the high thresholds used to correct for multiple comparisons in imaging data could be responsible for the activation in the speech production areas in the AV but not A condition, we also examined activation patterns relative to baseline at a lower threshold (t(16) = 4, single voxel p = 0.001). At this low uncorrected threshold, the AV condition still had more activation than the A condition in the left IFG (especially in BA 44) and dorsal aspects of the left precentral gyrus (BA 4 and 6). The AV condition and not the A condition activated bilateral aspects of more posterior STG/STS, right IFG (BA 44), dorsal aspects of the right precentral gyrus, and the right cerebellum. In addition, the V condition showed a more robust pattern of activation, including the fusiform gyrus and IFG. Region-Of-Interest Results The group analysis is based on registering the different patterns of activity onto a single reference anatomy (Talairach & Tournoux, 1988). Despite its utility, this process can distort the details of individual anatomical structure, complicating accurate localization of activity and obscuring individual differences (Burton et al., 2001). To address this issue and to draw finer anatomical and functional distinctions, regions of interest (ROIs) were drawn onto each hemisphere of each participant’s high-resolution structural MRI scan. These ROIs were adapted from an MRI-based parcellation system (Caviness et al., 1996; Rademacher et al., 1992). 
The specific ROIs chosen, aimed to permit finer anatomical statements about differences between the AV and A conditions in the speech production areas, included the pars opercularis of the IFG (F3o), pars triangularis of the IFG (F3t), the dorsal two-thirds (PMd) and ventral one-third (PMv) of the precentral gyrus excluding primary motor cortex, the posterior aspect of the STG and the upper bank of the STS (T1p), and the posterior aspect of the supramarginal gyrus and the angular gyrus (SGp-AG). Table 2 describes the ROIs, their anatomical boundaries, and functional properties. We were particularly interested in F3o because the distribution of “mirror neurons” is hypothesized to be greatest in this area (Rizzolatti et al., 2002). Another ROI, the anterior aspect of the STG/STS (T1a), was drawn with the hypothesis that activation in this area would be more closely associated with processing of connected discourse (Humphries et al., 2001) and therefore would not differ between the Listening to Talking Faces 9 AV and A conditions. Finally, we included an ROI that encompassed the occipital lobe and temporaloccipital visual association cortex (including the lower bank of the posterior STS; TO2-OL) with the hypothesis that activity in this region would reflect visual processing and should not be active in the A condition. After delimiting these regions, we determined the total volume of activation within each ROI for each condition for each participant. We collected all voxels with a significant change in signal intensity for each task compared to baseline, i.e., voxels exceeding the threshold of z > 3.28, p < .001 corrected. To determine the difference between conditions, we compared the total volume of activation across participants for the AV and A conditions within each ROI using paired t-tests correcting for multiple comparisons (p < .004 unless otherwise stated). -------------------------------------------------------------------------Insert Table 2 about here -------------------------------------------------------------------------As in the group data, AV differed from A in volume of activation in a network of brain regions related to speech production. These regions included left PMd (t(8) = 5.19), right PMd (t(8) = 3,70), left F3o (t(8) = 4.06), left F3t (t(8) = 3.54), left T1p (t(8) = 4.12), and right T1p (t(8) = 4.45). There was no significant difference in the right F3o, right F3t, and bilateral SGp-AG. There were no significant differences in bilateral T1a, an area less closely associated with speech production and more closely associated with language comprehension. Finally, the AV and A conditions differed in the volume of activation in left TO2-OL (t(8) = 3.45), and right TO2-OL (t(8) = 3.74), areas associated primarily with visual processing. Viseme Results There were a variety of observable “non-linguistic” (e.g., head nods) and “linguistic” (e.g., place of articulation) movements produced by the talker in the AV condition. Some of the latter conveyed phonetic feature information, though most mouth movements by themselves are not sufficient for phonetic classification. However, a subset of visual speech movements, “visemes”, are sufficient (i.e., without the accompanying auditory modality) for phonetic classification. In this analysis we wished to determine if visemes, in contrast to other observable information about face and head movements in the AV stories, modulated activation in those motor regions that distinguish the AV from A conditions. 
This assesses whether the observed motor system activity was specific to perception of a specific aspect of motor behavior (i.e., speech production) on the part of the observed talker. That is, if the motor system activity is in service of understanding the speech, this Listening to Talking Faces 10 activity should be modulated by visual information that is informative about phonetic features and the presence of visemes within a story should relate to the amount of observed motor system activity. All stories were phonetically transcribed using the automated Oregon Speech Toolkit (Sutton et al., 1998) and the Center for Spoken Language Understanding (CSLU) labeling guide (Lander & Metzler, 1994). The proportion of visemes, derived from a prior list of visemes (Goldschen, 1993), relative to the total number of phonemes in each story was determined and stories were grouped into quartiles according the number of visemes. Stories in the first and fourth (t(6) = 23.97, p< .00001) and the first and third (t(6) = 13.86, p<.00001) quartiles significantly differed in the proportion of visemes. The volume and intensity of brain activity were compared in ROIs for the AV condition between the first and fourth and first and third viseme quartiles. As a control we also performed these comparisons for the A condition. The intensity of activity significantly increased when comparing the first and fourth quartiles in the AV condition in the same regions distinguishing the AV from A condition, regions that were also active during speech production as identified in our speech production control group. These were the right T1p (t(8) = 1.89 p< .05) and right PMd (t(8) = 2.81, p< .01). The first and third quartiles also differed for the AV condition in three areas, left T1p (t(8) = 3.38 p< .005), right T1p (t(8) = 4.26 p< .002), and right PMd (t(8) = 2.42 p< .02). None of these regions significantly differed for the A condition. In regions identified as being less closely related to speech production and more closely related to language comprehension, only left T1a (t(8) = 2.55 p< .02) for the AV condition and left T1a (t(8) = 2.79 p< .01) and right SGp-AG (t(8) = 2.21 p< .03) for the A condition differed when comparing the first and third quartiles. There were no differences between the first and second quartiles for either the AV or A conditions. Discussion The present results show that audiovisual language comprehension activates brain areas that are involved in both sensory and motor aspects of speech production. This is an extensive network that comprises Broca’s area (i.e., the pars opercularis) of the inferior frontal gyrus, the dorsal aspect of the precentral gyrus, including both premotor and adjacent primary motor cortices, the postcentral gyrus, and the cerebellum. In these areas, there was a paucity of activation in either audio-alone or visualalone conditions. For the auditory comprehension condition, only the post-central gyrus was activated Listening to Talking Faces 11 both in language comprehension and in speech production. For the visual condition there was some tendency for activation in the pars opercularis, which was active during speech production. Activation of speech production areas during audiovisual but not audio-alone language comprehension cannot be attributed simply to methodological considerations (e.g., an overly conservative correction for multiple comparisons). Nor are these results likely attributable to differences in speech comprehensibility across conditions. 
Participants understood and reported details of the stories in both AV and A conditions and there were no differences in these reports between conditions. The participants attended to and understood the stories in both conditions. In addition, previous research has shown that certain areas associated with language comprehension show greater activity with increasing difficulty of sentence comprehension (Just et al., 1996). If the A condition was more difficult to understand than the AV condition, then we would expect to see greater activity in these areas during the audio-alone condition, but we did not. When considering the results of the audio-only condition, we attribute the lack of activity in those cortical areas typically associated with speech production to the fact that under normal conditions listeners can process language solely in terms of its acoustic properties (cf. Klatt, 1979; Stevens & Blumstein, 1981) and thus may not need to recruit the motor system to understand speech (Liberman & Mattingly, 1985). This aspect of our results is consistent with previous functional imaging studies in which passive listening to auditory stimuli does not reliably elicit Broca’s area, premotor, or primary motor activation whereas overt phonetic decisions (among other overt tasks) do (for a review see Small & Burton, 2001). These tasks may engage parts of the brain involved in language production through covert rehearsal and/or working memory (e.g., Jonides et al., 1998; Smith & Jonides, 1999). However, in “normal” listening environments the production system is not normally involved (or is only weakly involved) in auditory language comprehension, although it certainly can be engaged. Our interpretation of our results is that audiovisual speech activates areas traditionally associated with both speech production and speech comprehension to encode observed facial movements and to integrate them into the overall process of understanding spoken language. This does not occur in the absence of the visual modality. When the visual modality is presented alone a subset of these processes may take place. In the following sections, we elaborate on this interpretation in relation to our results and to prior research in language comprehension and production. Listening to Talking Faces 12 Broca’s Area Broca’s area was significantly active during both audiovisual and audio-alone language comprehension. This activity was primarily restricted to the pars triangularis in the A condition. Broca’s area is traditionally viewed as supporting a mechanism by which phonological forms are coded into articulatory forms (Geschwind, 1965). It is commonly activated during both overt and covert speech production (Friederici et al., 2000; Grafton et al., 1997; Huang et al., 2001; Papathanassiou et al., 2000). However, results of production studies seem to suggest that Broca’s area is not itself involved in controlling articulation per se (Bookheimer et al., 1995; Huang et al., 2001; Wise et al., 1999), but may be a “pre-articulatory” region (Blank et al., 2002). In support of this, naming is interrupted in fewer than 36% of patients stimulated at the posterior aspect of the inferior frontal gyrus (Ojemann et al., 1989). Furthermore, lesions restricted to Broca’s area are clinically associated with Broca’s aphasia for only a few days (Knopman et al., 1983; Masdeu & O'Hara, 1983; Mohr et al., 1978) and the role of Broca’s area in producing Broca’s aphasia is unclear (Dronkers, 1996, 1998). 
Further supporting the notion that Broca’s area is not involved in controlling articulation per se is that activation in this area is not specific to oral speech as Broca’s area is activated during production of American Sign Language (Braun et al., 2001; Corina et al., 1999) and is activated by the observation and imitation of nonlinguistic but meaningful goal-directed movements (Binkofski et al., 2000; Ehrsson et al., 2000; Grezes et al., 1999; Hermsdorfer et al., 2001; Iacoboni et al., 1999; Koski et al., 2002). Nor does activation of Broca’s area in nonlinguistic domains simply represent covert verbal coding of the tasks given to subjects (Heiser et al., 2003). This review suggests the Broca’s area, though playing a role in speech production, is not simply a speech production area but rather, given its functional properties, is a general-purpose mechanism for relating (multimodal) perception and action. This review also suggests that refinements are necessary in the functional neuroanatomy of Broca’s area in both speech comprehension and production. We distinguished between the pars triangularis and the pars opercularis. We postulate that the common activation of the pars triangularis in both audiovisual and auditory language comprehension may reflect semantic or memory processing related to discourse comprehension in the two conditions (Devlin et al., 2003; Friederici et al., 2000; Gabrieli et al., 1998), and may not be related to cortical systems playing a role in speech production per se. However, as one moves more posterior along the gyrus (i.e., the pars opercularis), functions tend to be more closely related to production. Listening to Talking Faces 13 Broca’s Area: Pars Opercularis Our results indicate that AV language comprehension specifically activates the dorsal aspect of pars opercularis and that this activation overlaps that associated with speech production. Although there is no clear sulcal boundary between areas 44 and 45 (Amunts et al., 1999) we consider the pars opercularis a proxy for area 44 as this is where most of this cytoarchitectural region is located. The results of the ROI analysis confirm that activation was truly in the opercular region of individual subjects. Region 44 is the suggested homologue of macaque inferior premotor cortex (area F5), a region containing mirror neurons (Rizzolatti et al., 2002). Our results are consistent with the known properties of these neurons, namely that they fire upon perception (i.e., hearing and/ or seeing) and execution of particular types of goal-directed hand or mouth movements (Fadiga et al., 2000; Gallese et al., 1996; Kohler et al., 2002; Rizzolatti, 1987; Rizzolatti et al., 1996; Umilta et al., 2001). This result is also consistent with other neuroimaging evidence suggesting that the pars opercularis in the human has functional properties consistent with those of mirror neurons (Binkofski et al., 2000; Iacoboni et al., 1999; Koski et al., 2002). More specifically, our results are consistent with the claim that the dorsal aspect of the pars opercularis has more mirror-cell-like properties than the ventral portion, as the dorsal aspect is activated during both observation and imitation of goal-oriented actions where as the more ventral portion is activated during imitation only (Koski et al., 2002; MolnarSzakacs et al., 2002). 
From our results and the reviewed literature, we argue that neurons in the dorsal aspect of the pars opercularis of Broca's area play a role in the perception of articulatory gestures in audiovisual speech comprehension due to their mirror-like properties. That is, we suggest that the perceived articulatory gesture is represented in this area as a tentative phonological simulation or hypothesis, rather than as a parametric specification of the particular muscle movements necessary to produce it. The increased activation in this region during the V condition suggests that, in the absence of auditory perceptual contingencies or constraints, this area has to work harder to simulate or hypothesize movements.

Dorsal Precentral Gyrus

The observed precentral gyrus and sulcus activity occurred only in the AV condition. This activation was primarily in the dorsal aspect, included both premotor (PMd) and adjacent primary motor cortex, and did not include the classically defined frontal eye fields (Geyer et al., 2000). Activation in this region overlapped with that occurring during speech production. In addition, activation in the right PMd region was modulated by the amount of viseme content in the AV condition and in no other condition. These activation patterns are consistent with the hypothesized role of this area in human speech production. Stimulation of the PMd region has been shown to disrupt vocalization (Ojemann, 1979) and to do so more consistently than stimulation of the inferior frontal gyrus (Ojemann et al., 1989). In addition, this region has been shown to be more consistently active than the pars opercularis during overt speech production (Huang et al., 2001; Wise et al., 1999). Stimulation of these sites, however, does not produce speech arrest nearly to the extent that occurs in the more ventral aspect of the precentral gyrus (Ojemann et al., 1989). PMd may serve to integrate the hypothesized articulatory goal of the observed talker (specified in the dorsal aspect of the pars opercularis) with the actual motor commands that would be necessary to achieve that goal in production. This hypothesis has been suggested previously in the monkey literature with regard to integrating visual information about motor movements with motor commands (Halsband & Passingham, 1985). Similarly, in humans it has been suggested that PMd is involved in selecting motor acts based on arbitrary visual or auditory stimuli (Grafton et al., 1998; Iacoboni et al., 1998; Kurata et al., 2000). Kurata et al. (2000) conclude that PMd plays an important role in conditional sensory-motor integration. We believe that modulation of activity in this region by the viseme content of the stories reflects the role of PMd in sensory-motor integration: as the ambiguity of the audiovisual signal decreases, there is a concomitant increase in PMd activity, suggesting that it is easier to derive a motor plan. This specificity is not apparent in the pars opercularis because there the level of representation is more abstract and multiple hypotheses are being represented. Finally, activation of adjacent primary motor cortex may be inhibitory or may reflect subthreshold activation of the motor plan generated in PMd.

Superior Temporal Gyrus and Sulcus

The superior temporal gyrus and sulcus posterior to primary auditory cortex, and anterior to the supramarginal and angular gyri, were more active during the AV than the A condition. This area was also active during speech production.
Furthermore, during audiovisual comprehension, activity in this region significantly modulated by the amount of viseme content in the audiovisual stories, becoming more active as viseme content increased. Previous research has shown that damage to posterior superior temporal cortex results in a deficit in repeating speech (Anderson et al., 1999; Hickok, 2000) and stimulation of these sites results in speech production errors (Ojemann, 1979; Listening to Talking Faces 15 Ojemann et al., 1989). On the perceptual side, research indicates that the STS is activated by the observation of biologically relevant movements (such as the mouth movements in the present study) and by implied movements of the eyes, mouth, and hands (for a review see Allison et al., 2000). In addition, this area is activated to a greater extent by linguistically meaningful facial movements than to facial movements that do not have linguistic meaning (e.g., Campbell et al., 2001). In the present study, the activation that was produced by the presence of visemes is consistent with the sensitivity of this region to biologically relevant movements and specifically to speech movements. In addition, our finding is consistent with the interpretation that this area is a site participating in the integration of seen and heard speech (Calvert et al., 2000; Sams et al., 1991). In sum, the posterior superior temporal gyrus and sulcus seem to participate in both speech perception and production as a cortical convergence zone (Damasio et al., 1990) having auditory, visual, and motor properties. We offer an active “knowledge based” model of speech perception to account for these results (Nusbaum & Schwab, 1986). The posterior superior temporal gyrus and sulcus may provide an audiovisual description of speech actions that can interface with cortical areas that are important in speech production. We suggest it is through this interface that the pars opercularis and dorsal premotor cortex receive audio-visual pattern information about heard and observed speech movements. These areas, in turn, work to generate audio-visual-motor “hypotheses” consistent with the audio-visual properties. The winning hypothesis is generated in pre/primary motor cortex and sent back to the posterior superior temporal area to help constrain interpretation of a given linguistic segment. Such motor-to-sensory discharges have been found to occur during speech production (Paus et al., 1996a; Paus et al., 1996b). We suggest that such discharge also occurs during perception in the audiovisual context because speech is being (covertly) produced or modeled in the form of this hypothesis about what is heard and observed. This also suggests an explanation for the activation of the postcentral gyrus during audiovisual language comprehension as this area is associated with proprioceptive feedback related to speech production (Lotze et al., 2000). The somatotopy in this region corresponds to the mouth and tongue and is activated by tongue stimulation and movement (Cao et al., 1993; Sakai et al., 1995). This hypothesized movement may help constrain interpretation of a given linguistic segment by interacting with or by comparison with results of sensory processing in posterior STG/STS. This has been proposed for the imitation of manual movements where it was found that a “reafferent copy” of an actual movement interacts with observed movements in the STG/STS Listening to Talking Faces 16 (Iacoboni et al., 2001). 
During audiovisual speech perception, this interaction has the effect of lending support to a particular interpretation of a stretch of utterance. A Network for Audio–Visual–Motor Integration Taken together, we suggest that these areas form a network serving audio–visual–motor integration during language comprehension. This is an elaboration of the idea that there is a processing “stream” (e.g., posterior superior temporal and frontal cortex) for audio–motor integration that is active when a task, e.g., speech discrimination requires explicit decisions about phonetic segments (Hickok & Poeppel, 2000). Audiovisual speech perception may be one instance when this network is naturally active rather than driven by task demands. That is, activation of these regions is usually associated with explicit metalinguistic phonological judgments (Benson et al., 2001; Buchsbaum et al., 2001; Burton et al., 2000; Heim et al., 2003) or explicit articulation (Buchsbaum et al., 2001; Heim et al., 2003; Hickok et al., 2000; Paus et al., 1996a; Paus et al., 1996b; Wise et al., 2001). In the case of audiovisual language comprehension this network is naturally driven meaning that available cues are utilized to generate an action hypothesis regarding observed movements. This network is composed of posterior superior temporal cortex, the superior portion of the pars opercularis, the dorsal aspect of premotor cortex, adjacent motor cortex, the postcentral gyrus, and the cerebellum. Individually, each of these areas has been implicated in the production of speech and nonspeech related movements. (Though not reviewed above the cerebellum has a clear role in speech production (e.g., Riecker et al., 2000). Collectively, they are engaged in interpreting and acting in the audiovisual environment. This network may encode the visual configuration of observed facial movements (STG/STS), suggest an hypothesis pertaining to the abstracted goal of the observed production (pars opercularis), and select a motor plan that corresponds to that hypothesis (dorsal premotor cortex). This motor plan has auditory perceptual consequences that are realized through feedback from premotor and/or primary motor cortex back to the STG/STS. We suggest that this network operates in a goal-directed and context-dependent manner and is not necessarily engaged by input from the auditory or visual modalities alone. However, this network may be explicitly engaged (e.g., during a speech discrimination task or when attempting to speech read). In the presence of visual speech alone, when speech reading is unsuccessful (as in the V condition), a subset of this system may be engaged (i.e., the pars opercularis) to suggest hypotheses. Indeed, perhaps this area works harder to generate such hypotheses because they are not constrained by Listening to Talking Faces 17 the auditory signal. Actual production involves the same neural circuitry, differing only in the intensity of interactions among these areas and the added involvement of other regions, for example, ventral premotor or insular cortex. By this framework, audiovisual integration occurs in multiple brain regions over time and has a motor component. This is in contrast with previous views of audiovisual speech where integration occurs in the posterior superior temporal gyrus or sulcus with no motor component. 
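The information flow proposed here can be summarized in a short schematic. The toy sketch below is only an illustration of the hypothesis-and-test cycle described above, not an implementation from the study; the phoneme list, viseme labels, and matching rule are invented for the example. It shows posterior STG/STS supplying an audiovisual description, the pars opercularis proposing goal-level gestural hypotheses, and dorsal premotor cortex selecting a plan whose predicted auditory consequences feed back to constrain the interpretation.

```python
# Purely schematic sketch of the proposed audio-visual-motor loop; the candidate
# phonemes, viseme categories, and matching rule are invented for illustration and
# are not a computational model tested in this study.

VISEME_OF = {"ba": "lips_closed", "da": "tongue_tip", "ga": "jaw_open"}  # hypothetical

def sts_description(heard, seen):
    """Posterior STG/STS: a joint audiovisual description of the observed speech."""
    return {"acoustic": heard, "visual": seen}

def opercularis_hypotheses(description):
    """Pars opercularis: goal-level gestural hypotheses compatible with what was seen."""
    return [p for p, viseme in VISEME_OF.items() if viseme == description["visual"]]

def pmd_predicted_audio(hypothesis):
    """PMd selects a motor plan for the hypothesis; feedback to STG/STS carries the
    auditory consequences that executing that plan would be expected to produce."""
    return hypothesis  # in this toy model the plan's predicted sound is the phoneme itself

def interpret(heard, seen):
    desc = sts_description(heard, seen)
    candidates = opercularis_hypotheses(desc) or list(VISEME_OF)  # fall back to all
    # The winning hypothesis is the one whose predicted auditory consequence best
    # matches the heard signal, i.e., the prediction constrains the interpretation.
    return max(candidates, key=lambda h: float(pmd_predicted_audio(h) == desc["acoustic"]))

print(interpret(heard="da", seen="tongue_tip"))  # -> "da"
```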
Conclusions In the present study, we have shown that language comprehension, in the context of the mouth and face movements involved in speech production, activates a network of brain regions involved in audio–visual–motor integration. This network is not activated during audio-alone language processing and minimally activated by visual-alone viewing of oral-facial movements. However, language comprehension in an audiovisual context activates a network of speech production areas that are also activated during overt syllable production. Furthermore, this distributed network is sensitive to visual information about the phonetic structure of the stories. The brain regions comprising this network include posterior superior temporal cortex, the superior portion of the pars opercularis, the dorsal aspect of premotor cortex, the adjacent motor cortex, somatosensory cortex, and the cerebellum. This result is consistent with recent findings in macaques (e.g., Rizzolatti et al., 2002) and humans (e.g., Iacoboni et al., 1998) about the role of motor areas in action understanding. It also extends the data on human action understanding to the realm of speech and language, an important goal-directed behavior. With regard to language comprehension, we suggest that this distributed network serves to represent the visual configuration of observed facial movements, the motor commands that could have been used to generate that configuration, and the associated expected auditory consequences of executing that hypothesized motor plan. It is possible that activity within this network mediates the improvement in understanding of speech gained with audiovisual presentations. Methods Twenty-two participants were recruited from the student population of The University of Chicago and three participants were not used in the analysis because of technical problems during the experimental session (i.e., head movement, stimulus presentation failure, and failure to complete the scanning sequence). All participants were right handed as determined by the Edinburgh handedness inventory (Oldfield, 1971), had normal hearing, and normal uncorrected vision. The participants gave Listening to Talking Faces 18 written consent and the study was approved by the Biological Science Division’s Institutional Review Board of The University of Chicago. Nine participants (5 females; mean age = 25; SD = 8) formed an audiovisual language comprehension group with three conditions. In an audio-alone condition (A), participants listened to spoken stories. In an audiovisual condition (AV), participants watched and listened to high-resolution video clips of the storyteller, filmed from the neck up, telling stories. In a video-alone (V) condition, participants watched video clips of the storyteller telling these stories, but without the accompanying sound track. The stories were highly engaging and participants were simply asked to attend to them. Pre-testing indicated that listeners were interested in each story and could readily report details without special instruction to do so. No overt motor response was required. In all, participants were presented 28 stories told by a single storyteller. Story duration ranged from 18-24 seconds. The stories were repeated in each the AV, A, and V conditions and were counterbalanced. Audio stimuli were delivered to participants at 85 dB SPL through headphones containing MRI-compatible electromechanical transducers (Resonance Technologies, Inc., Northridge, CA). 
Participants viewed stimuli through a mirror that allowed them to see a screen at the end of the scanning bed. After completion of the scanning session, participants in the language comprehension group were interviewed about the stimuli and they reported understanding and being engaged by the stories. They answered specific questions about the events that occurred in the stories though they were not instructed that they would be doing so. All participants accurately described stories that they found interesting. Thirteen additional participants (6 females; mean age = 22; SD = 5) performed a control task of syllable production. Participants saw “pa”, “ka”, or “ta” on a screen for 1 second and were asked to begin repeating the sound at a normal rate and volume until the screen said “Stop”. Participants repeated “pa”, “ka”, or “ta” each for 12 seconds in 5 blocks for a total of 15 blocks. fMRI Acquisition, Registration, and Image Analysis For the comprehension group, scans were acquired on a 1.5 Tesla scanner whereas scans for the control speech production group were collected at 3 Tesla. Both groups used spiral acquisition (Noll et al., 1995) with a standard head coil and volumetric T1-weighted scans (124 axial slices, 1.5 x 0.938 x 0.938 mm resolution) were acquired to provide anatomical images on which landmarks could be found and on which functional activation maps could be superimposed. For the comprehension group, 24 6 mm spiral gradient echo T2* functional images were collected every 3 seconds in the axial plane. A Listening to Talking Faces 19 total of 224 whole brain images were collected in each of 4 runs. For the speech production group, 29 3.8 mm spiral gradient echo T2* functional images were collected every 1.5 seconds in the axial plane. A total of 244 whole brain images were collected. Images from both groups were spatially registered in three-dimensional space by Fourier transformation of each of the time points and corrected for head movement, using the AFNI software package (Cox, 1996). Image matrices were 128 X 128 X 24 and 128 X 128 X 29 for both groups respectively. This resulted in an effective in-plane resolution of 1.875 X 1.875 X 6 mm for the comprehension group and 1.875 X 1.875 X 3.8 mm for the production group. For both the comprehension task and the production task, individual participants’ functional imaging data were analyzed in "blocks" using a standard multiple linear regression analysis. Regressors were waveforms with similarity to the hemodynamic response, generated by convolving a gamma-variant function with the onset time and duration of the blocks of interest. For the comprehension group there were 3 such regressors for each the AV, A, and V conditions. For analysis of syllable production, there were 3 regressors for each of the produced syllables (i.e., PA, KA, and TA). The remaining regressors for both groups were the mean, linear and quadratic component of each of the functional runs. Analysis of the speech production data also incorporated 6 motion parameters, generated from the registration process, as regressors to control for talking-specific head motion. For the group analyses, anatomical and functional images were then interpolated to volumes with 2 mm voxels, co-registered, converted to Talairach stereotaxic coordinate space (Talairach & Tournoux, 1988), and smoothed (4 mm Gaussian full-width half-maximum filter) to decrease spatial noise. 
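To make the single-subject block analysis just described concrete, the following sketch (written in Python rather than the AFNI tools actually used) builds a condition regressor by convolving a boxcar of block onsets and durations with a gamma-variate function, adds mean, linear, and quadratic drift terms, and estimates the regression coefficients by least squares. The gamma-variate parameters and the example block timings are illustrative assumptions, not values taken from the study.

```python
import numpy as np

TR = 3.0        # comprehension-group TR in seconds (from the Methods)
n_vols = 224    # whole-brain images per run (from the Methods)
t = np.arange(n_vols) * TR

def gamma_hrf(dt=TR, duration=24.0, p=8.6, q=0.547):
    """Gamma-variate hemodynamic response; p and q are assumed, illustrative parameters."""
    tt = np.arange(0.0, duration, dt)
    h = (tt / (p * q)) ** p * np.exp(p - tt / q)
    return h / h.sum()

def block_regressor(onsets, durations, n_vols, dt=TR):
    """Convolve a boxcar of block onsets/durations with the gamma-variate response."""
    box = np.zeros(n_vols)
    for on, dur in zip(onsets, durations):
        box[int(on // dt): int((on + dur) // dt)] = 1.0
    return np.convolve(box, gamma_hrf(dt))[:n_vols]

# Hypothetical block timing for one run (the real onsets came from the stimulus log).
av = block_regressor(onsets=[12, 180, 390], durations=[20, 20, 20], n_vols=n_vols)
a  = block_regressor(onsets=[72, 250, 460], durations=[20, 20, 20], n_vols=n_vols)
v  = block_regressor(onsets=[130, 320, 540], durations=[20, 20, 20], n_vols=n_vols)

# Design matrix: condition regressors plus mean, linear, and quadratic drift terms.
drift = np.vstack([np.ones(n_vols), t, t ** 2]).T
X = np.column_stack([av, a, v, drift])

y = np.random.randn(n_vols)  # stand-in for one voxel's time series
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["AV", "A", "V"], beta[:3])))
```

In the analysis reported here, one such regressor was constructed for each condition (AV, A, and V for the comprehension group; PA, KA, and TA for the production group), and the six motion parameters from registration were added as nuisance regressors for the production data.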
For both groups a group voxel-wise mixed-effects two-factor analysis of variance (condition x participant, where participants are considered a random sample) was applied to the percent signal change from baseline of the regression coefficients from the block analysis. Condition had 3 levels for the comprehension group (AV, A, and V) and three levels for production group (PA, KA, and TA). An activated voxel in either ANOVA was defined by an individual voxel probability less than 0.000002 and a minimum cluster connection radius of 2.1 and cluster size of 10 micro liters. These thresholds were established using the AlphaSim component of AFNI and are based on 10,000 Monte Carlo simulations and cluster size thresholding, resulting in an overall corrected significance level of alpha less than .05. This same threshold was also applied to statistical contrasts. The AV, A, or V conditions were compared to the syllable production condition using a conjunction analysis of activation resulting from the group ANOVAs. A conjunction analysis reveals Listening to Talking Faces20 brain activation common to a number of tasks and excludes activation in regions were activation wasabsent in one or more of these tasks (Friston et al., 1999; Price & Friston, 1997). Conjunctions wereperformed on whole-brain corrected images using the same cluster threshold as above, but thethreshold at the individual-voxel level was set at a P-value of less than √.000002 = 0.0014, to take intoaccount the reduced probability of type-I error (see Friston et al., 1999).For the individual region-of-interest (ROI) analysis, seven ROIs were hand-drawn on each ofthe participant’s anatomical images from the comprehension group. ROIs were based on an existingparcellation system (Caviness et al., 1996; Rademacher et al., 1992). Anatomical and functionaldescriptions of the regions are given in Table 2. The number of active voxels and their averagedintensities as defined by the block analysis were then extracted for each region of interest. Activevoxels were defined using a false discovery rate algorithm using the 3dFDR component of AFNI. Thisallowed us to set a corrected threshold p = .002. Paired t-tests corrected for multiple comparisonscomparing volume and averaged intensity of activation from each ROI were used to look at differencesbetween conditions.ReferencesAllison, T., Puce, A., & McCarthy, G. (2000). Social perception from visual cues: role of the STSregion. Trends Cogn Sci, 4(7), 267-278.Amunts, K., Schleicher, A., Burgel, U., Mohlberg, H., Uylings, H. B., & Zilles, K. (1999). Broca'sregion revisited: cytoarchitecture and intersubject variability. Journal of ComparativeNeurology, 412(2), 319-341.Anderson, J. M., Gilmore, R., Roper, S., Crosson, B., Bauer, R. M., Nadeau, S., et al. (1999).Conduction aphasia and the arcuate fasciculus: A reexamination of the Wernicke-Geschwindmodel. Brain Lang, 70(1), 1-12.Barbas, H., & Pandya, D. N. (1989). Architecture and intrinsic connections of the prefrontal cortex inthe rhesus monkey. J Comp Neurol, 286(3), 353-375.Benson, R. R., Whalen, D. H., Richardson, M., Swainson, B., Clark, V. P., Lai, S., et al. (2001).Parametrically dissociating speech and nonspeech perception in the brain using fMRI. BrainLang, 78(3), 364-396. Listening to Talking Faces21 Binkofski, F., Amunts, K., Stephan, K. M., Posse, S., Schormann, T., Freund, H. J., et al. (2000).Broca's region subserves imagery of motion: a combined cytoarchitectonic and fMRI study.Hum Brain Mapp, 11(4), 273-285.Blank, S. C., Scott, S. 
K., Murphy, K., Warburton, E., & Wise, R. J. (2002). Speech production: Wernicke, Broca and beyond. Brain, 125(Pt 8), 1829-1838.
Bookheimer, S. Y., Zeffiro, T. A., Blaxton, T., Gaillard, W., & Theodore, W. (1995). Regional cerebral blood flow during object naming and word reading. Human Brain Mapping, 3, 93-106.
Braun, A. R., Guillemin, A., Hosey, L., & Varga, M. (2001). The neural organization of discourse: an H2 15O-PET study of narrative production in English and American sign language. Brain, 124(Pt 10), 2028-2044.
Buchsbaum, B., Hickok, G., & Humphries, C. (2001). Role of left posterior superior temporal gyrus in phonological processing for speech perception and production. Cognitive Science, 25, 663-678.
Burton, M. W., Noll, D. C., & Small, S. L. (2001). The Anatomy of Auditory Word Processing: Individual Variability. Brain and Language, 77(1), 119-131.
Burton, M. W., Small, S. L., & Blumstein, S. E. (2000). The Role of Segmentation in Phonological Processing: An fMRI Investigation. Journal of Cognitive Neuroscience, 12(4), 679-690.
Callan, D. E., Callan, A. M., Kroos, C., & Vatikiotis-Bateson, E. (2001). Multimodal contribution to speech perception revealed by independent component analysis: a single-sweep EEG case study. Brain Res Cogn Brain Res, 10(3), 349-353.
Calvert, G. A., Bullmore, E. T., Brammer, M. J., Campbell, R., Williams, S. C., McGuire, P. K., et al. (1997). Activation of auditory cortex during silent lipreading. Science, 276(5312), 593-596.
Calvert, G. A., & Campbell, R. (2003). Reading speech from still and moving faces: the neural substrates of visible speech. J Cogn Neurosci, 15(1), 57-70.
Calvert, G. A., Campbell, R., & Brammer, M. J. (2000). Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Curr Biol, 10(11), 649-657.
Campbell, R., MacSweeney, M., Surguladze, S., Calvert, G., McGuire, P., Suckling, J., et al. (2001). Cortical substrates for the perception of face actions: an fMRI study of the specificity of activation for seen speech and for meaningless lower-face acts (gurning). Brain Res Cogn Brain Res, 12(2), 233-243.
Cao, Y., Towle, V. L., Levin, D. N., & Balter, J. M. (1993). Functional mapping of human motor cortical activation with conventional MR imaging at 1.5 T. J Magn Reson Imaging, 3(6), 869-875.
Caviness, V. S., Meyer, J., Makris, N., & Kennedy, D. N. (1996). MRI-based topographic parcellation of human neocortex: An anatomically specified method with estimate of reliability. Journal of Cognitive Neuroscience, 8(6), 566-587.
Chomsky, N., & Halle, M. (1968). The Sound Pattern of English. New York: Harper.
Corina, D. P., McBurney, S. L., Dodrill, C., Hinshaw, K., Brinkley, J., & Ojemann, G. (1999). Functional roles of Broca's area and SMG: evidence from cortical stimulation mapping in a deaf signer. Neuroimage, 10(5), 570-581.
Cox, R. W. (1996). AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput Biomed Res, 29(3), 162-173.
Damasio, A. R., Damasio, H., Tranel, D., & Brandt, J. P. (1990). Neural Regionalization of Knowledge Access: Preliminary Evidence. Paper presented at the Cold Spring Harbor Symposia on Quantitative Biology.
Devlin, J. T., Matthews, P. M., & Rushworth, M. F. (2003). Semantic processing in the left inferior prefrontal cortex: a combined functional magnetic resonance imaging and transcranial magnetic stimulation study. J Cogn Neurosci, 15(1), 71-84.
Dodd, B., & Campbell, R. (1987). Hearing by eye: the psychology of lipreading. London, England: Lawrence Erlbaum.
Dronkers, N. F. (1996). A New Brain Region for Coordinating Speech Articulation. Nature, 384, 159-161.
Dronkers, N. F. (1998). Symposium: The role of Broca's area in language. Brain and Language, 65, 71-72.
Ehrsson, H. H., Fagergren, A., Jonsson, T., Westling, G., Johansson, R. S., & Forssberg, H. (2000). Cortical activity in precision- versus power-grip tasks: an fMRI study. J Neurophysiol, 83(1), 528-536.
Fadiga, L., Fogassi, L., Gallese, V., & Rizzolatti, G. (2000). Visuomotor neurons: ambiguity of the discharge or 'motor' perception? Int J Psychophysiol, 35(2-3), 165-177.
Fant, G. (1960). Acoustic Theory of Speech Production. The Hague: Mouton.
Friederici, A. D., Opitz, B., & von Cramon, D. Y. (2000). Segregating semantic and syntactic aspects of processing in the human brain: an fMRI investigation of different word types. Cereb Cortex, 10(7), 698-705.
Friston, K. J., Holmes, A. P., Price, C. J., Buchel, C., & Worsley, K. J. (1999). Multisubject fMRI studies and conjunction analyses. Neuroimage, 10(4), 385-396.
Gabrieli, J. D., Poldrack, R. A., & Desmond, J. E. (1998). The role of left prefrontal cortex in language and memory. Proc Natl Acad Sci U S A, 95(3), 906-913.
Gallese, V., Fadiga, L., Fogassi, L., & Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain, 119(Pt 2), 593-609.
Geschwind, N. (1965). Disconnection Syndromes in Animals and Man. Brain, 88, 237-294, 585-644.
Geyer, S., Matelli, M., Luppino, G., & Zilles, K. (2000). Functional neuroanatomy of the primate isocortical motor system. Anat Embryol (Berl), 202(6), 443-474.
Goldschen, A. J. (1993). Continuous Automatic Speech Recognition by Lipreading. George Washington University.
Goodglass, H. (1993). Understanding Aphasia. San Diego, California: Academic Press.
Grafton, S. T., Fadiga, L., Arbib, M. A., & Rizzolatti, G. (1997). Premotor cortex activation during observation and naming of familiar tools. Neuroimage, 6(4), 231-236.
Grafton, S. T., Fagg, A. H., & Arbib, M. A. (1998). Dorsal premotor cortex and conditional movement selection: A PET functional mapping study. J Neurophysiol, 79(2), 1092-1097.
Grant, K. W., & Seitz, P. F. (2000). The use of visible speech cues for improving auditory detection of spoken sentences. J Acoust Soc Am, 108(3 Pt 1), 1197-1208.
Graziano, M. S., & Gandhi, S. (2000). Location of the polysensory zone in the precentral gyrus of anesthetized monkeys. Exp Brain Res, 135(2), 259-266.
Green, K. (1998). The use of auditory and visual information during phonetic processing: implications for theories of speech perception. In D. Burnham (Ed.), Hearing by Eye II: Advances in the Psychology of Speechreading and Auditory-visual Speech (pp. 3-25). Hove, UK: Psychology Press.
Grezes, J., Costes, N., & Decety, J. (1999). The effects of learning and intention on the neural network involved in the perception of meaningless actions. Brain, 122(Pt 10), 1875-1887.
Hackett, T. A., Stepniewska, I., & Kaas, J. H. (1999). Prefrontal connections of the parabelt auditory cortex in macaque monkeys. Brain Res, 817(1-2), 45-58.
Halsband, U., & Passingham, R. E. (1985). Premotor cortex and the conditions for movement in monkeys (Macaca fascicularis). Behav Brain Res, 18(3), 269-277.
Heim, S., Opitz, B., Muller, K., & Friederici, A. D. (2003). Phonological processing during language production: fMRI evidence for a shared production-comprehension network. Brain Res Cogn Brain Res, 16(2), 285-296.
Heiser, M., Iacoboni, M., Maeda, F., Marcus, J., & Mazziotta, J. C. (2003). The essential role of Broca's area in imitation. Eur J Neurosci, 17(5), 1123-1128.
Hermsdorfer, J., Goldenberg, G., Wachsmuth, C., Conrad, B., Ceballos-Baumann, A. O., Bartenstein, P., et al. (2001). Cortical correlates of gesture processing: clues to the cerebral mechanisms underlying apraxia during the imitation of meaningless gestures. Neuroimage, 14(1 Pt 1), 149-161.
Hickok, G. (2000). Speech perception, conduction aphasia, and the functional neuroanatomy of language. In D. Swinney (Ed.), Language and the Brain (pp. 87-104). San Diego: Academic Press.
Hickok, G., Erhard, P., Kassubek, J., Helms-Tillery, A. K., Naeve-Velguth, S., Strupp, J. P., et al. (2000). A functional magnetic resonance imaging study of the role of left posterior superior temporal gyrus in speech production: implications for the explanation of conduction aphasia. Neurosci Lett, 287(2), 156-160.
Hickok, G., & Poeppel, D. (2000). Towards a functional neuroanatomy of speech perception. Trends Cogn Sci, 4(4), 131-138.
Huang, J., Carr, T. H., & Cao, Y. (2001). Comparing cortical activations for silent and overt speech using event-related fMRI. Human Brain Mapping, 15, 39-53.
Humphries, C., Willard, K., Buchsbaum, B., & Hickok, G. (2001). Role of anterior temporal cortex in auditory sentence comprehension: an fMRI study. Neuroreport, 12(8), 1749-1752.
Iacoboni, M., Koski, L. M., Brass, M., Bekkering, H., Woods, R. P., Dubeau, M. C., et al. (2001). Reafferent copies of imitated actions in the right superior temporal cortex. Proc Natl Acad Sci U S A, 98(24), 13995-13999.
Iacoboni, M., Woods, R. P., Brass, M., Bekkering, H., Mazziotta, J. C., & Rizzolatti, G. (1999). Cortical mechanisms of human imitation. Science, 286(5449), 2526-2528.
Iacoboni, M., Woods, R. P., & Mazziotta, J. C. (1998). Bimodal (auditory and visual) left frontoparietal circuitry for sensorimotor integration and sensorimotor learning. Brain, 121(Pt 11), 2135-2143.
Jonides, J., Smith, E. E., Marshuetz, C., Koeppe, R. A., & Reuter-Lorenz, P. A. (1998). Inhibition in verbal working memory revealed by brain activation. Proc Natl Acad Sci U S A, 95(14), 8410-8413.
Just, M. A., Carpenter, P. A., Keller, T. A., Eddy, W. F., & Thulborn, K. R. (1996). Brain activation modulated by sentence comprehension. Science, 274(5284), 114-116.
Klatt, D. H. (1979). Speech perception: A model of acoustic-phonetic analysis and lexical access. In R. A. Cole (Ed.), Perception and production of fluent speech. Hillsdale, NJ: Lawrence Erlbaum Associates, Ltd.
Knopman, D. S., Selnes, O. A., Niccum, N., Rubens, A. B., Yock, D., & Larson, D. (1983). A longitudinal study of speech fluency in aphasia: CT correlates of recovery and persistent nonfluency. Neurology, 33(9), 1170-1178.
Kohler, E., Keysers, C., Umilta, M. A., Fogassi, L., Gallese, V., & Rizzolatti, G. (2002). Hearing sounds, understanding actions: action representation in mirror neurons. Science, 297(5582), 846-848.
Koski, L., Wohlschlager, A., Bekkering, H., Woods, R. P., Dubeau, M. C., Mazziotta, J. C., et al. (2002). Modulation of motor and premotor activity during imitation of target-directed actions. Cereb Cortex, 12(8), 847-855.
Kuhl, P. K., & Meltzoff, A. N. (1982). The bimodal perception of speech in infancy. Science, 218(4577), 1138-1141.
Kurata, K., Tsuji, T., Naraki, S., Seino, M., & Abe, Y. (2000). Activation of the dorsal premotor cortex and pre-supplementary motor area of humans during an auditory conditional motor task. J Neurophysiol, 84(3), 1667-1672.
Lander, T., & Metzler, S. T. (1994). The CSLU labeling guide. Oregon.
Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21(1), 1-36.
Lotze, M., Seggewies, G., Erb, M., Grodd, W., & Birbaumer, N. (2000). The representation of articulation in the primary sensorimotor cortex. Neuroreport, 11(13), 2985-2989.
Ludman, C. N., Summerfield, A. Q., Hall, D., Elliott, M., Foster, J., Hykin, J. L., et al. (2000). Lip-reading ability and patterns of cortical activation studied using fMRI. Br J Audiol, 34(4), 225-230.
MacSweeney, M., Amaro, E., Calvert, G. A., Campbell, R., David, A. S., McGuire, P., et al. (2000). Silent speechreading in the absence of scanner noise: an event-related fMRI study. Neuroreport, 11(8), 1729-1733.
MacSweeney, M., Calvert, G. A., Campbell, R., McGuire, P. K., David, A. S., Williams, S. C., et al. (2002a). Speechreading circuits in people born deaf. Neuropsychologia, 40(7), 801-807.
MacSweeney, M., Campbell, R., Calvert, G. A., McGuire, P. K., David, A. S., Suckling, J., et al. (2001). Dispersed activation in the left temporal cortex for speech-reading in congenitally deaf people. Proc R Soc Lond B Biol Sci, 268(1466), 451-457.
MacSweeney, M., Woll, B., Campbell, R., McGuire, P. K., David, A. S., Williams, S. C., et al. (2002b). Neural systems underlying British Sign Language and audio-visual English processing in native users. Brain, 125(Pt 7), 1583-1593.
Masdeu, J. C., & O'Hara, R. J. (1983). Motor aphasia unaccompanied by faciobrachial weakness. Neurology, 33(4), 519-521.
Massaro, D. W. (1998). Perceiving talking faces: From speech perception to a behavioral principle. Cambridge, Massachusetts: MIT Press.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264(5588), 746-748.
Mohr, J. P., Pessin, M. S., Finkelstein, S., Funkenstein, H. H., Duncan, G. W., & Davis, K. R. (1978). Broca aphasia: pathologic and clinical. Neurology, 28(4), 311-324.
Molnar-Szakacs, I., Iacoboni, M., Koski, L., Maeda, F., Dubeau, M. C., Aziz-Zadeh, L., et al. (2002). Action observation in the pars opercularis: evidence from 58 subjects studied with fMRI. Paper presented at the Cognitive Neuroscience Society, San Francisco.
Mottonen, R., Krause, C. M., Tiippana, K., & Sams, M. (2002). Processing of changes in visual speech in the human auditory cortex. Brain Res Cogn Brain Res, 13(3), 417-425.
Noll, D. C., Cohen, J. D., Meyer, C. H., & Schneider, W. (1995). Spiral k-space MR imaging of cortical activation. Journal of Magnetic Resonance Imaging, 5, 49-56.
Nusbaum, H. C., & Schwab, E. C. (1986). The role of attention and active processing in speech perception. In E. C. Schwab & H. C. Nusbaum (Eds.), Pattern Recognition by Humans and Machines: Volume 1. Speech Perception (pp. 113-157). New York: Academic Press.
Ojemann, G. (1979). Individual variability in cortical localization of language. Journal of Neurosurgery, 50, 164-169.
Ojemann, G., Ojemann, J., Lettich, E., & Berger, M. (1989). Cortical Language Localization in Left, Dominant Hemisphere: An Electrical Stimulation Mapping Investigation in 117 Patients. Journal of Neurosurgery, 71, 316-326.
Oldfield, R. C. (1971). The Assessment and Analysis of Handedness: The Edinburgh Inventory. Neuropsychologia, 9, 97-113.
Olson, I. R., Gatenby, J. C., & Gore, J. C. (2002). A comparison of bound and unbound audio-visual information processing in the human cerebral cortex. Brain Res Cogn Brain Res, 14(1), 129-138.
Papathanassiou, D., Etard, O., Mellet, E., Zago, L., Mazoyer, B., & Tzourio-Mazoyer, N. (2000). A common language network for comprehension and production: a contribution to the definition of language epicenters with PET. Neuroimage, 11(4), 347-357.
Patterson, M. L., & Werker, J. F. (2003). Two-month-old infants match phonetic information in lips and voice. Developmental Science, 6(2), 193-198.
Paus, T., Marrett, S., Worsley, K., & Evans, A. (1996a). Imaging motor-to-sensory discharges in the human brain: an experimental tool for the assessment of functional connectivity. Neuroimage, 4(2), 78-86.
Paus, T., Perry, D. W., Zatorre, R. J., Worsley, K. J., & Evans, A. C. (1996b). Modulation of cerebral blood flow in the human auditory cortex during speech: role of motor-to-sensory discharges. Eur J Neurosci, 8(11), 2236-2246.
Penfield, W., & Roberts, L. (1959). Speech and Brain Mechanisms. Princeton: Princeton University Press.
Petrides, M., & Pandya, D. N. (1988). Association fiber pathways to the frontal cortex from the superior temporal region in the rhesus monkey. J Comp Neurol, 273(1), 52-66.
Petrides, M., & Pandya, D. N. (2002). Comparative cytoarchitectonic analysis of the human and the macaque ventrolateral prefrontal cortex and corticocortical connection patterns in the monkey. Eur J Neurosci, 16(2), 291-310.
Price, C. J., & Friston, K. J. (1997). Cognitive conjunction: a new approach to brain activation experiments. Neuroimage, 5(4 Pt 1), 261-270.
Rademacher, J., Galaburda, A. M., Kennedy, D. N., Filipek, P. A., & Caviness, V. S. (1992). Human cerebral cortex: localization, parcellation, and morphometry with magnetic resonance imaging. Journal of Cognitive Neuroscience, 4(4), 352-374.
Reisberg, D., McLean, J., & Goldfield, A. (1987). Easy to hear but hard to understand: a lipreading advantage with intact auditory stimuli. In R. Campbell (Ed.), Hearing by Eye: The Psychology of Lipreading (pp. 97-114). Hillsdale, NJ: Erlbaum.
Riecker, A., Ackermann, H., Wildgruber, D., Dogil, G., & Grodd, W. (2000). Opposite hemispheric lateralization effects during speaking and singing at motor cortex, insula and cerebellum. Neuroreport, 11(9), 1997-2000.
Rizzolatti, G. (1987). Functional organization of inferior area 6. Ciba Found Symp, 132, 171-186.
Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the recognition of motor actions. Brain Res Cogn Brain Res, 3(2), 131-141.
Rizzolatti, G., Fogassi, L., & Gallese, V. (2002). Motor and cognitive functions of the ventral premotor cortex. Curr Opin Neurobiol, 12(2), 149-154.
Romanski, L. M., Bates, J. F., & Goldman-Rakic, P. S. (1999). Auditory belt and parabelt projections to the prefrontal cortex in the rhesus monkey. J Comp Neurol, 403(2), 141-157.
Romanski, L. M., & Goldman-Rakic, P. S. (2002). An auditory domain in primate prefrontal cortex. Nat Neurosci, 5(1), 15-16.
Sakai, K., Watanabe, E., Onodera, Y., Itagaki, H., Yamamoto, E., Koizumi, H., et al. (1995). Functional mapping of the human somatosensory cortex with echo-planar MRI. Magn Reson Med, 33(5), 736-743.
Sams, M., Aulanko, R., Hamalainen, M., Hari, R., Lounasmaa, O. V., Lu, S. T., et al. (1991). Seeing speech: visual information from lip movements modifies activity in the human auditory cortex. Neurosci Lett, 127(1), 141-145.
Small, S. L., & Burton, M. W. (2001). Functional Neuroimaging of Language. In R. S. Berndt (Ed.), Handbook of Neuropsychology, Second Edition, Volume 3: Language and Aphasia (pp. 335-351). Amsterdam: Elsevier.
Small, S. L., & Nusbaum, H. C. (In Press). On the neurobiological investigation of language understanding in context. Brain and Language.
Smith, E. E., & Jonides, J. (1999). Storage and executive processes in the frontal lobes. Science, 283(5408), 1657-1661.
Stevens, K. N., & Blumstein, S. E. (1981). The search for invariant acoustic correlates of phonetic features. In P. Eimas & J. Miller (Eds.), Perspectives on the Study of Speech. Hillsdale, NJ: Lawrence Erlbaum.
Sumby, W. H., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. The Journal of the Acoustical Society of America, 26(2), 212-215.
Surguladze, S. A., Calvert, G. A., Brammer, M. J., Campbell, R., Bullmore, E. T., Giampietro, V., et al. (2001). Audio-visual speech perception in schizophrenia: an fMRI study. Psychiatry Res, 106(1), 1-14.
Sutton, S., Cole, R. A., de Villiers, J., Schalkwyk, J., Vermeulen, P., Macon, M., et al. (1998, November). Universal Speech Tools: the CSLU Toolkit. Paper presented at the Proceedings of the International Conference on Spoken Language Processing, Sydney, Australia.
Talairach, J., & Tournoux, P. (1988). Co-Planar Stereotaxic Atlas of the Human Brain: 3D Proportional System: An Approach to Cerebral Imaging. New York, New York: Georg Thieme Verlag.
Umilta, M. A., Kohler, E., Gallese, V., Fogassi, L., Fadiga, L., Keysers, C., et al. (2001). I know what you are doing: A neurophysiological study. Neuron, 31(1), 155-165.
Wise, R. J., Greene, J., Buchel, C., & Scott, S. K. (1999). Brain regions involved in articulation. Lancet, 353(9158), 1057-1061.
Wise, R. J., Scott, S. K., Blank, S. C., Mummery, C. J., Murphy, K., & Warburton, E. A. (2001). Separate neural subsystems within 'Wernicke's area'. Brain, 124(Pt 1), 83-95.

Table 1
Location, center of mass, and amount of cortical activity in significant (t(16) = 7.2, single-voxel p = 0.000002, p < .05 corrected) clusters produced by the audiovisual (AV), audio-alone (A), and video-alone (V) conditions for the group.

Table 2
Regions of interest (ROIs) and their anatomical and functional characteristics.

Figure Captions
Figure 1. Cortical activation produced by the audiovisual, audio-alone, and video-alone conditions for the group (t(16) = 7.2, single-voxel p = 0.000002, p < .05 corrected). Activation, in red, is projected onto the surface of the brain of a single subject.
Figure 2. Conjunction (overlap) analyses of the audiovisual and speaking conditions and of the audio-alone and speaking conditions for the groups. Conjunctions were performed on whole-brain corrected images using an individual-voxel p < .001. Activation, in red, is displayed on single sagittal slices from each of the left (LH) and right (RH) hemispheres from the averaged volume of the nine participants in the language comprehension group.
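For readers who want to connect the quoted thresholds to the underlying arithmetic, the short Python sketch below reproduces the t-to-p conversion for the single-voxel threshold in Table 1 and Figure 1 and illustrates, on invented data, how a conjunction (overlap) map of the kind described for Figure 2 can be formed. It is a minimal illustration under stated assumptions (SciPy's t distribution, a one-sided per-voxel threshold, random stand-in volumes), not the AFNI-based analysis actually used in the study.

```python
# Minimal sketch only (Python/SciPy), not the study's actual AFNI pipeline.
# It checks the reported per-voxel threshold (t(16) = 7.2, p = 0.000002,
# two-tailed) and forms a conjunction (overlap) map from two independently
# thresholded t maps. All volumes and variable names here are invented.

import numpy as np
from scipy import stats

df = 16            # degrees of freedom quoted with the t threshold
t_thresh = 7.2
p_two_tailed = 2 * stats.t.sf(t_thresh, df)
print(f"t({df}) = {t_thresh} -> two-tailed p = {p_two_tailed:.1e}")  # ~2e-06

# Hypothetical stand-in t maps for two contrasts (e.g., audiovisual, speaking).
rng = np.random.default_rng(0)
t_map_a = rng.standard_normal((64, 64, 64))
t_map_b = rng.standard_normal((64, 64, 64))

# Individual-voxel p < .001 per map (one-sided here, an assumption);
# the conjunction keeps only voxels that survive in both maps.
t_crit = stats.t.isf(0.001, df)
conjunction = (t_map_a > t_crit) & (t_map_b > t_crit)
print("voxels surviving in both maps:", int(conjunction.sum()))
```

Running the first part confirms that t(16) = 7.2 corresponds to a two-tailed p of roughly 2 × 10⁻⁶, matching the single-voxel value reported in the table and figure captions.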
